Summary

An important problem in particle physics is the detection of a signal against a 
noisy background using limited data. This will arise when processing results from the 
Large Hadron Collider, for example. We discuss a simple probability model for this 
and derive frequentist and non-informative Bayesian procedures for inference about 
the signal, based on the likelihood function. Both procedures are highly accurate in 
realistic cases, with the frequentist procedure having the edge for interval estimation, 
and the Bayesian procedure yielding slightly better point estimates. We also argue that 
the significance, or p-value, function based on the modified likelihood root provides 
a comprehensive presentation of the information in the data and should be used for 
inference. 
1 Introduction



The detection of a signal in the presence of background noise is central to particle
discovery in high energy physics, for example using the data to be generated by experiments
with the Large Hadron Collider. This essentially statistical topic has been discussed
extensively in the recent literature (Mandelkern, 2002; Fraser et al., 2004, and the references
therein) and at a series of meetings involving statisticians and physicists; see for example
http://www.physics.ox.ac.uk/phystat05/. One key issue is the setting of confidence limits
on the underlying signal, based on data from a number of independent channels. In order
to compare properties of possible signal detection procedures, it was decided at the workshop
on Statistical Inference Problems in High Energy Physics and Astronomy held at the Banff
International Research Station in 2006 that one participant would create artificial data that
should mimic those that might arise when the Large Hadron Collider is running, and that other
participants would attempt to set confidence limits for the known underlying signal. Thus the
Banff Challenge (http://newton.hep.upenn.edu/~heinrich/birs/) was born.

For a single channel the challenge may be stated as follows: the available data $y_1, y_2, y_3$
are assumed to be realisations of independent Poisson random variables with means
$\gamma\psi + \beta$, $\beta t$, $\gamma u$, where $t, u$ are known positive constants and the
parameters $\psi, \beta, \gamma$ are unknown. The goal is to summarise the evidence concerning
$\psi$, large estimates of which will suggest presence of the signal. The parameters $\beta$
and $\gamma$ are necessary for realism, but their values are only of concern to the extent that
they impinge on inference for $\psi$.
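The single-channel model can be made concrete with a minimal sketch that evaluates the Poisson log likelihood and its unconstrained maximiser. The function names are ours, not the paper's. The closed-form estimates come from setting each Poisson mean equal to its observed count, which assumes $y_2, y_3 > 0$. The counts and constants in the usage lines are those of the illustrative example of §3.2.

```python
import math

def poisson_loglik(psi, beta, gamma, y, t, u):
    """Single-channel log likelihood, dropping the log(y_i!) terms."""
    means = (gamma * psi + beta, beta * t, gamma * u)
    if min(means) <= 0:
        return -math.inf  # outside the (extended) parameter space
    return sum(yi * math.log(m) - m for yi, m in zip(y, means))

def mle(y, t, u):
    """Unconstrained maximiser: each Poisson mean is matched to its count."""
    y1, y2, y3 = y
    beta = y2 / t             # background rate from the subsidiary count
    gamma = y3 / u            # acceptance from the subsidiary count
    psi = (y1 - beta) / gamma # may be negative in the extended space
    return psi, beta, gamma

# Illustrative data of Section 3.2
y, t, u = (1, 8, 14), 27.0, 80.0
psi_hat, beta_hat, gamma_hat = mle(y, t, u)
print(round(psi_hat, 3))  # 4.021
```

The estimate of $\psi$ agrees with the maximum likelihood estimate quoted in §3.2.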

This model of course represents a highly idealised version of a statistical problem that will
arise in dealing with data from the Large Hadron Collider. It is very simple, but important
statistical issues arise nonetheless: how is evidence about the value of $\psi$ best summarised?
How should one deal with the nuisance parameters $\beta$, $\gamma$? This second issue is even
more critical in the case of multiple channels, where the number of nuisance parameters is much
larger. Below we follow Fraser et al. (2004) in arguing that the evidence concerning $\psi$ is
best summarised through a so-called significance function, and in §2 describe the general
construction of significance functions that yield highly accurate frequentist inferences even
with many nuisance parameters; such a significance function is equivalent to a set of confidence
intervals at various levels. In §3 we give results for the Poisson model.

Statisticians are in broad agreement that the likelihood function is a central quantity
for inference. Bayesian inference uses the likelihood to update prior information about the
model parameters, thereby producing a posterior probability density for those parameters
that summarises what it is reasonable to believe in the light of the data (Jeffreys, 1961;
Forster and O'Hagan, 2004). This approach is attractive and widely used in applications,
but scientists using different prior densities may arrive at different conclusions based on the
same data. One might argue that this is inevitable given the varied points of view held within
any scientific community, but this lack of uniqueness is awkward when an objective statement
is sought. One way to unite this multiplicity of possible posterior beliefs is to base inference
on a so-called non-informative prior, which we discuss in §4 for the Poisson model described
above.

The paper ends with a brief discussion. 



2 Likelihood and significance 



There are many published accounts of modern likelihood theory. The outline below is taken
from Brazzale et al. (2007), where further references may be found.
We consider a probability density function $f(y; \psi, \lambda)$ that depends on two parameters.
The interest parameter $\psi$ is the focus of the investigation: it may be required to test
whether it has a specific value $\psi_0$, or to produce a confidence interval for the true but
unknown value of $\psi$. Often $\psi$ is scalar, and this is the case here: $\psi$ represents the
signal central to our enquiry. The nuisance parameter $\lambda$ is not of direct interest, but
must be included for the model to be realistic. In the single-channel case the vector
$\lambda = (\beta, \gamma)$ represents the values of background noise and signal intensity. We
let $\theta = (\psi, \lambda)$ denote the entire parameter vector.

The log likelihood function is central to the discussion below. It is defined as
$\ell(\theta) = \log f(y; \theta)$, and it is maximised by the maximum likelihood estimator
$\hat\theta$, which satisfies $\ell(\hat\theta) \ge \ell(\theta)$ for all $\theta$ lying in the
parameter space $\Omega_\theta$, which we take to be an open subset of $\mathbb{R}^d$.
We suppose that $\psi$ may take values in the interval $(\psi_-, \psi_+)$, where one or both of
the limits may be infinite. A natural summary of the support for $\psi$ provided by the
combination of model and data is the profile log likelihood

\[ \ell_p(\psi) = \ell(\hat\theta_\psi) = \ell(\psi, \hat\lambda_\psi) = \max_{\lambda} \ell(\psi, \lambda), \]

where $\hat\lambda_\psi$ is the value of $\lambda$ that maximises the log likelihood for fixed $\psi$.

Suppose that a random sample of size $n$ is generated from $f(y; \theta_0)$. Under regularity
conditions on $f$, the estimator $\hat\theta$ has an approximate normal distribution with mean
$\theta_0$ and variance matrix $j(\hat\theta)^{-1}$, where
$j(\theta) = -\partial^2\ell(\theta)/\partial\theta\,\partial\theta^T$ is the observed information
matrix. This result can be used as the basis of confidence intervals for $\psi_0$, based on the
limiting standard normal, $N(0, 1)$, distribution of the Wald pivot
$t(\psi_0) = j_p(\hat\psi)^{1/2}(\hat\psi - \psi_0)$, where

\[ j_p(\psi) = \frac{|j(\hat\theta_\psi)|}{|j_{\lambda\lambda}(\hat\theta_\psi)|}, \]

in which $|\cdot|$ indicates determinant and $j_{\lambda\lambda}(\theta)$ denotes the
$(\lambda, \lambda)$ corner of the observed information matrix. In many ways a preferable basis
for confidence intervals is the likelihood root

\[ r(\psi) = \mathrm{sign}(\hat\psi - \psi) \left[ 2\{\ell_p(\hat\psi) - \ell_p(\psi)\} \right]^{1/2}, \]

which may also be treated as an $N(0, 1)$ variable. If it is required to test the hypothesis that
$\psi = \psi_0$ against the one-sided hypothesis that $\psi > \psi_0$, then the quantities
$1 - \Phi\{r(\psi_0)\}$ and $1 - \Phi\{t(\psi_0)\}$ are treated as significance probabilities, also
known as p-values, small values of which will cast doubt on the belief that $\psi = \psi_0$.
Throughout the paper $\Phi$ represents the cumulative probability function of the standard
normal distribution.

The monotonic decreasing function $\Phi\{r(\psi)\}$ is an example of a significance function,
from which we may draw inferences about $\psi$. An approximate lower confidence bound
$\psi_\alpha$ for $\psi_0$ is the solution to the equation $\Phi\{r(\psi)\} = 1 - \alpha$; the
confidence interval $(\psi_\alpha, \psi_+)$ should contain $\psi_0$ with probability $1 - \alpha$.
An approximate upper bound $\psi_{1-\alpha}$ is obtained by solution of
$\Phi\{r(\psi)\} = \alpha$, giving confidence interval $(\psi_-, \psi_{1-\alpha})$, and the
two-sided interval $(\psi_\alpha, \psi_{1-\alpha})$ will contain $\psi_0$ with probability
approximately $(1 - 2\alpha)$. Using these so-called first order approximations, the one-sided
intervals in fact contain $\psi_0$ with probability $1 - \alpha + O(n^{-1/2})$, while the
two-sided interval contains $\psi_0$ with probability $(1 - 2\alpha) + O(n^{-1})$. Significance
functions may be based on the Wald pivot $t(\psi)$ or on related quantities involving the log
likelihood derivative $\partial\ell/\partial\psi$, which also have approximate $N(0, 1)$
distributions for large $n$, but the intervals based on $r(\psi)$ are preferable because they
always yield subsets of $(\psi_-, \psi_+)$ as confidence sets. Further, they are invariant to
invertible interest-preserving reparameterizations of the form
$(\psi, \lambda) \mapsto (g(\psi), h(\lambda, \psi))$, in the sense that if $\mathcal{I}$ is a
confidence interval for $\psi$ in the original parametrization, then $g(\mathcal{I})$ is the
corresponding interval in the new parametrization; this property is not possessed by intervals
based on the Wald pivot, for example.

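Improved inferences may be obtained through significance functions based on the modified
likelihood root

\[ r^*(\psi) = r(\psi) + \frac{1}{r(\psi)} \log\left\{ \frac{q(\psi)}{r(\psi)} \right\}, \tag{1} \]

where

\[ q(\psi) = \frac{\left| \varphi(\hat\theta) - \varphi(\hat\theta_\psi) \quad \varphi_\lambda(\hat\theta_\psi) \right|}{\left| \varphi_\theta(\hat\theta) \right|} \left\{ \frac{|j(\hat\theta)|}{|j_{\lambda\lambda}(\hat\theta_\psi)|} \right\}^{1/2} \tag{2} \]

is determined by a local exponential family approximation whose canonical parameter
$\varphi(\theta)$ is described below, and $\varphi_\theta$ denotes the matrix
$\partial\varphi/\partial\theta^T$ of partial derivatives. The numerator of the first term of (2)
is the determinant of a $d \times d$ matrix whose first column is
$\varphi(\hat\theta) - \varphi(\hat\theta_\psi)$ and whose remaining columns are
$\varphi_\lambda(\hat\theta_\psi)$. For continuous variables, one-sided confidence intervals
based on the significance function $\Phi\{r^*(\psi)\}$ have coverage error $O(n^{-3/2})$ rather
than $O(n^{-1/2})$.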
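To show how (1) is used in the simplest possible setting, the sketch below treats a toy model that is our own choice and not the model of this paper: a single Poisson count $y$ with mean $\mu$ and no nuisance parameter. Taking the canonical parameter to be $\varphi = \log\mu$, the determinants in (2) reduce to scalars and $q$ becomes the Wald statistic on the canonical scale, $(\log y - \log\mu)\,y^{1/2}$. The function name `r_and_rstar` is hypothetical, and the code assumes $y > 0$ and $y \neq \mu$.

```python
import math
from statistics import NormalDist

def r_and_rstar(y, mu):
    """Likelihood root r and modified root r* for one Poisson count y > 0."""
    # The log likelihood y*log(mu) - mu is maximised at mu_hat = y.
    dev = 2.0 * (y * math.log(y / mu) - (y - mu))
    r = math.copysign(math.sqrt(dev), y - mu)
    # Wald-type statistic on the canonical scale phi = log(mu); the observed
    # information for phi at the maximum equals y.
    q = (math.log(y) - math.log(mu)) * math.sqrt(y)
    rstar = r + math.log(q / r) / r
    return r, rstar

Phi = NormalDist().cdf
r, rstar = r_and_rstar(10, 5.0)
p_r, p_rstar = 1.0 - Phi(r), 1.0 - Phi(rstar)
print(round(r, 3), round(rstar, 3))
```

Because the data are discrete, the tail probability based on $r^*$ is best compared with a mid-p value rather than with an exact tail probability.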

For a sample of independent continuous observations $y_1, \ldots, y_n$, we define

\[ \varphi(\theta)^T = \sum_{k=1}^n \left. \frac{\partial \ell(\theta; y)}{\partial y_k} \right|_{y = y^0} V_k, \]

where $y^0$ denotes the observed data, and $V_1, \ldots, V_n$ is a set of $1 \times d$ vectors
that depend on the observed data alone. If the observations are discrete, then the theoretical
accuracy of the approximations is reduced to $O(n^{-1})$, and the interpretation of significance
functions such as $\Phi\{r^*(\psi)\}$ changes slightly. In the discrete setting of this paper we
take (Davison et al., 2006)

\[ V_k = \left. \frac{\partial E(y_k; \theta)}{\partial \theta^T} \right|_{\theta = \hat\theta}. \tag{3} \]
An important special case is that of a log likelihood with independent contributions of curved
exponential family form,

\[ \ell(\theta) = \sum_{k=1}^n \{ \alpha_k(\theta)^T y_k - c_k(\theta) \}, \tag{4} \]

where $\alpha_k(\theta)^T y_k$ denotes scalar product. In this case

\[ \varphi(\theta)^T = \sum_{k=1}^n \alpha_k(\theta)^T V_k. \tag{5} \]

Inference using (1) is easily performed. If functions are available to compute $\ell(\theta)$
and $\varphi(\theta)$, then the maximisations needed to obtain $\hat\theta$ and
$\hat\theta_\psi$ and the differentiation needed to compute (2) may be performed numerically.



3 Likelihood inference 
3.1 Model formulation 

Under the proposed model, the observation for the $k$th channel is assumed to be a realisation
of $Y_k = (Y_{1k}, Y_{2k}, Y_{3k})$, where the three components are independent Poisson variables
with respective means $(\gamma_k\psi + \beta_k, \beta_k t_k, \gamma_k u_k)$, for
$k = 1, \ldots, n$. Here $Y_{1k}$ represents the main measurement, $Y_{2k}$ and $Y_{3k}$ are
respectively subsidiary background and acceptance measurements, and $t_k$ and $u_k$ are known
positive constants.

The signal parameter $\psi$ is of interest, and $(\beta_1, \gamma_1, \ldots, \beta_n, \gamma_n)$
is treated as a nuisance parameter. In principle all these parameters should be non-negative,
but it is mathematically reasonable to entertain negative values for $\psi$, provided
$\psi > \max_k\{-\beta_k/\gamma_k\}$. Below we use this extended parameter space for numerical
purposes, but restrict interpretation of the results to the set of physically meaningful values
$\psi \ge 0$, as suggested by Fraser et al. (2004).

For computational purposes we take
$\lambda = (\lambda_{11}, \lambda_{21}, \ldots, \lambda_{1n}, \lambda_{2n})$, with
$(\lambda_{1k}, \lambda_{2k}) = (\log\beta_k - \log\gamma_k, \log\beta_k)$, so that
$\exp(\lambda_{1k}) > -\psi$ and $\lambda_{2k} \in \mathbb{R}$, $k = 1, \ldots, n$. The invariance
properties outlined in the previous section mean that inferences on $\psi$ are unaffected by
this reparameterization.

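The log likelihood function for $\theta = (\psi, \lambda)$ has curved exponential family form (4)
with

\[ \alpha_k(\theta)^T = \left\{ \log\left(\psi e^{\lambda_{2k} - \lambda_{1k}} + e^{\lambda_{2k}}\right),\ \lambda_{2k},\ \lambda_{2k} - \lambda_{1k} \right\}, \tag{6} \]

\[ y_k = (y_{1k}, y_{2k}, y_{3k})^T, \]

\[ c_k(\theta) = (\psi + u_k) e^{\lambda_{2k} - \lambda_{1k}} + (1 + t_k) e^{\lambda_{2k}}. \]

In general, $\hat\theta$ and $\hat\theta_\psi$ must be computed numerically. It is convenient to
compute $\hat\theta_\psi$ first, and then obtain $\hat\theta$ by maximising the profile log
likelihood $\ell(\hat\theta_\psi)$.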
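As a check on this formulation, the sketch below codes the curved exponential family quantities for one channel, namely $\alpha_k(\theta) = \{\log(\psi e^{\lambda_{2k}-\lambda_{1k}} + e^{\lambda_{2k}}), \lambda_{2k}, \lambda_{2k}-\lambda_{1k}\}$ and $c_k(\theta) = (\psi + u_k)e^{\lambda_{2k}-\lambda_{1k}} + (1 + t_k)e^{\lambda_{2k}}$. It compares $\alpha_k(\theta)^T y_k - c_k(\theta)$ with the Poisson log likelihood written directly in terms of $(\psi, \beta, \gamma)$. The two should differ only by the constant $y_2\log t + y_3\log u$, which does not depend on $\theta$. All function names are ours, and the parameter values in the usage lines are arbitrary.

```python
import math

def alpha_k(psi, lam1, lam2):
    """Natural observables' coefficients alpha_k(theta) for one channel."""
    return (math.log(psi * math.exp(lam2 - lam1) + math.exp(lam2)),
            lam2,
            lam2 - lam1)

def c_k(psi, lam1, lam2, t, u):
    """c_k(theta): the sum of the three Poisson means."""
    return (psi + u) * math.exp(lam2 - lam1) + (1.0 + t) * math.exp(lam2)

def loglik_cef(psi, lam1, lam2, y, t, u):
    """Channel contribution alpha_k(theta)^T y_k - c_k(theta)."""
    a = alpha_k(psi, lam1, lam2)
    return sum(ai * yi for ai, yi in zip(a, y)) - c_k(psi, lam1, lam2, t, u)

def loglik_poisson(psi, beta, gamma, y, t, u):
    """Direct Poisson log likelihood, dropping the log(y_i!) terms."""
    means = (gamma * psi + beta, beta * t, gamma * u)
    return sum(yi * math.log(m) - m for yi, m in zip(y, means))

def to_lambda(beta, gamma):
    """(lambda_1, lambda_2) = (log beta - log gamma, log beta)."""
    return math.log(beta) - math.log(gamma), math.log(beta)

y, t, u = (1, 8, 14), 27.0, 80.0
for psi, beta, gamma in [(4.0, 0.3, 0.2), (1.5, 0.5, 0.1)]:
    lam1, lam2 = to_lambda(beta, gamma)
    diff = loglik_poisson(psi, beta, gamma, y, t, u) - loglik_cef(psi, lam1, lam2, y, t, u)
    print(round(diff, 6))  # same constant for every parameter value
```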

The dimension of the nuisance parameter in this model may be reduced by a conditioning
argument that applies to Poisson response data, but for simplicity of exposition we use the
Poisson formulation here. The trinomial model that emerges from the conditioning is used
below in §4.2. Properties of the Poisson model imply that numerical results from the two
formulations are identical.

Figure 1: Inferential summaries for the illustrative single channel data. Left panel: profile
relative log likelihood $\ell_p(\psi) - \ell_p(\hat\psi)$ (dashes), $-r^*(\psi)^2/2$ (solid), and
$-r^*_B(\psi)^2/2$ (dots). Right panel: $\Phi\{r(\psi)\}$ (dashes), $\Phi\{r^*(\psi)\}$ (solid)
and $\Phi\{r^*_B(\psi)\}$ (dots). Horizontal lines are at values 0.99, 0.01, and 0.5, and give
respectively the lower and upper bounds of a confidence interval of level 0.98, and a median
unbiased estimate of $\psi$. The intersection of the significance function with the vertical
line at $\psi = 0$ leads to a p-value for testing the hypothesis $\psi = 0$ against the
one-sided hypothesis $\psi > 0$.



3.2 One channel 

When only one channel is available, that is, $n = 1$, the log likelihood has full exponential
form, that is, the number of observations equals the number of parameters. The canonical
parameter $\varphi(\theta)$ given by (5) and (6) is then equivalent to $\alpha_1(\theta)$, in the
sense that any affine transformation of the canonical parameter gives the same $q(\psi)$ in (2)
and the same inference for $\psi$.

A standard way to summarize the evidence concerning $\psi$ is to present the profile log
likelihood $\ell_p(\psi)$ and the significance function $\Phi\{r(\psi)\}$ (Fraser et al., 2004),
but, as mentioned above, more accurate inferences are obtained from the modified likelihood
root, $r^*(\psi)$. As the profile relative log likelihood $\ell_p(\psi) - \ell_p(\hat\psi)$ equals
$-r(\psi)^2/2$, the quantity $-r^*(\psi)^2/2$ can be regarded as the adjusted profile log
likelihood corresponding to the significance function $\Phi\{r^*(\psi)\}$.

For illustration we consider data with $y_1 = 1$, $y_2 = 8$, $y_3 = 14$ and $t = 27$, $u = 80$,
for which Figure 1 shows the profile and the adjusted profile log likelihoods and the
corresponding significance functions; the construction of our Bayesian solution $r^*_B(\psi)$ is
explained in §4. The maximum likelihood estimate, $\hat\psi = 4.021$, may be determined from the
significance function as the solution to the equation $\Phi\{r(\psi)\} = 0.5$. The analogous
estimate obtained using the modified likelihood root, the median unbiased estimate
$\hat\psi^* = 4.966$, satisfies $\Phi\{r^*(\hat\psi^*)\} = 0.5$. The corresponding estimator has
equal probabilities of falling to the left or to the right of the true parameter value, a
property preferable to classical unbiasedness because it does not depend on the
parameterization.

One minus the value of the significance function at $\psi = 0$ gives the significance probability
for testing the presence of a signal, namely the p-value for testing the hypothesis $\psi = 0$
against the one-sided hypothesis $\psi > 0$. In the present example, $\Phi\{r(0)\} = 0.837$ and
$\Phi\{r^*(0)\} = 0.873$, thus giving p-values respectively equal to 0.163 and 0.127, both weak
evidence of a positive signal. This is hardly surprising, as $y_1 = 1$: just one event has been
observed.

As explained in §2, the significance function provides lower and upper bounds for any
desired confidence level. Figure 1 indicates the choice of lower and upper bounds for level 0.99.
In particular, for the modified likelihood root, we get $\Phi\{r^*(\psi^*_{0.99})\} = 0.99$ and
$\Phi\{r^*(\psi^*_{0.01})\} = 0.01$, with $\psi^*_{0.99} = -2.603$ and $\psi^*_{0.01} = 36.519$,
where the subscript gives the value of the significance function at the limit. It is possible
for these limits to be negative, as happens in the present case for the lower bound. In such
instances, we take as a limit the maximum $\max(\psi^*, 0)$ of the actual limit, $\psi^*$, and
the lowest physically admissible value, zero. The fact that the lower bound is zero in this case
is coherent with the p-value for testing a positive signal. In fact, a right-tail confidence
interval of level 0.99 in this case contains all possible parameter values, also including 0;
thus it is $[0, +\infty)$. A left-tail confidence interval is $[0, 36.510)$, although such
intervals are not well suited to claim the presence of signal, given the meaning of confidence
intervals. The analogous limits obtained using the likelihood root $r(\psi)$ are
$\psi_{0.99} = -2.644$ and $\psi_{0.01} = 33.835$.
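The first-order quantities quoted in this section for the likelihood root $r(\psi)$ can be reproduced with a short sketch. It works with the conditional trinomial log likelihood of §4.2, $y_1\log(\psi+\zeta) + y_2\log\zeta - s\log(\psi+\zeta+u+\zeta t)$ with $\zeta = \beta/\gamma$, which the text states gives the same numerical results as the Poisson form. For fixed $\psi$ the score equation in $\zeta$ is a quadratic, and taking its larger root as the constrained maximiser is our own algebra rather than something stated in the paper; it assumes $y_1, y_3 > 0$. The function names and the bisection brackets are likewise our choices. The modified root $r^*(\psi)$ is not computed here, since it needs the determinants in (2).

```python
import math
from statistics import NormalDist

Y1, Y2, Y3, T, U = 1, 8, 14, 27.0, 80.0   # illustrative single-channel data
S = Y1 + Y2 + Y3
Phi = NormalDist().cdf

def zeta_hat(psi):
    """Constrained maximiser of the trinomial log likelihood over zeta:
    larger root of a*Y3*z^2 - B*z - Y2*psi*b = 0 (our algebra)."""
    a, b = 1.0 + T, psi + U
    B = (Y1 + Y2) * b - (Y1 + Y3) * a * psi
    disc = B * B + 4.0 * a * Y3 * Y2 * psi * b
    return (B + math.sqrt(disc)) / (2.0 * a * Y3)

def profile(psi):
    """Profile log likelihood l_p(psi), up to an additive constant."""
    z = zeta_hat(psi)
    return (Y1 * math.log(psi + z) + Y2 * math.log(z)
            - S * math.log(psi + z + U + z * T))

PSI_HAT = (Y1 - Y2 / T) / (Y3 / U)          # maximum likelihood estimate

def r(psi):
    """Likelihood root r(psi)."""
    dev = max(0.0, 2.0 * (profile(PSI_HAT) - profile(psi)))
    return math.copysign(math.sqrt(dev), PSI_HAT - psi)

def solve(level, lo, hi, tol=1e-8):
    """Bisection for Phi{r(psi)} = level; Phi{r(.)} decreases in psi."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if Phi(r(mid)) > level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p_value = 1.0 - Phi(r(0.0))            # test of psi = 0 against psi > 0
lower = solve(0.99, -10.0, PSI_HAT)    # significance function equals 0.99
upper = solve(0.01, PSI_HAT, 200.0)    # significance function equals 0.01
print(round(PSI_HAT, 3), round(p_value, 3), round(lower, 3), round(upper, 3))
```

The negative lower limit lies in the extended parameter space and would be reported as $\max(\psi, 0) = 0$.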

In extreme situations confidence limits at any standard choice of $\alpha$ may be negative, thus
giving confidence intervals including only the value $\psi = 0$. We see this feature of the
method as a perfectly sensible frequentist answer (see also Cox, 2006, Example 3.7). In such
instances the p-value for testing $\psi = 0$ against the alternative $\psi > 0$ would be very
close to 1, thus strongly suggesting that there is no positive signal. However, the fact that no
physically realistic parameter value is supported by the observed data also casts doubt on the
model.

In the Banff Challenge only coverage of left-tail confidence intervals (upper bounds) was
tested, though we regard p-values and lower bounds as more appropriate for inference on $\psi$.
Figure 2 shows the coverage of 0.90 and 0.99 confidence limits for a set of 39,700 simulated
datasets with large variability in the values of the nuisance parameters. The coverage is very
good, with only minor undercoverage in the 0.99 upper bounds when the parameter $\psi$ is small.
Similar results were obtained for another set of simulated datasets, with lower variability in
the nuisance parameters. We also performed some simulation studies of our own, and found that
the method typically performed very well. Table 1 displays results in the worst scenario that we
found. Apart from some minor issues in the right tail, $r^*$ performs extremely well.
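A coverage check of this kind can be sketched as follows. The sketch is not the authors' simulation code and does not attempt to reproduce Table 1. It uses the first-order likelihood root $r(\psi)$ rather than $r^*(\psi)$, a hand-written Poisson sampler, and far fewer replications, all of which are our simplifications. It relies on the fact that a $(1-\alpha)$ upper limit covers the true value $\psi_0$ exactly when $\Phi\{r(\psi_0)\} \ge \alpha$, so no root-finding is needed. The parameter settings are those given in the caption of Table 1, and the code assumes $y_3 > 0$ in every replicate.

```python
import math
import random
from statistics import NormalDist

Phi = NormalDist().cdf

def rpois(mean, rng):
    """Poisson draw by Knuth's multiplication method (fine for modest means)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def xlogy(x, y):
    """x*log(y) with the convention 0*log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def r_at(psi0, y1, y2, y3, t, u):
    """Likelihood root r(psi0) from the trinomial form, for psi0 > 0, y3 > 0."""
    s = y1 + y2 + y3
    a, b = 1.0 + t, psi0 + u
    B = (y1 + y2) * b - (y1 + y3) * a * psi0
    z = (B + math.sqrt(B * B + 4.0 * a * y3 * y2 * psi0 * b)) / (2.0 * a * y3)
    pi = psi0 + z + u + z * t
    constrained = (xlogy(y1, (psi0 + z) / pi) + xlogy(y2, z * t / pi)
                   + xlogy(y3, u / pi))
    saturated = sum(xlogy(y, y / s) for y in (y1, y2, y3))
    psi_hat = (y1 - y2 / t) / (y3 / u)
    dev = max(0.0, 2.0 * (saturated - constrained))
    return math.copysign(math.sqrt(dev), psi_hat - psi0)

def coverage(alpha, psi0, beta, gamma, t, u, nrep, seed=1):
    """Empirical coverage of the (1 - alpha) upper limit based on r."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(nrep):
        y1 = rpois(gamma * psi0 + beta, rng)
        y2 = rpois(beta * t, rng)
        y3 = rpois(gamma * u, rng)
        # the upper limit covers psi0 iff Phi{r(psi0)} >= alpha
        hits += Phi(r_at(psi0, y1, y2, y3, t, u)) >= alpha
    return hits / nrep

cov90 = coverage(0.10, 1.0, math.exp(1.1), 1.0, 33.0, 100.0, nrep=2000)
print(cov90)
```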

In some boundary cases with $y_1 = 0$ it is impossible to compute the quantities needed for
(2). In these rare cases we replaced $r^*(\psi)$ with $r(\psi)$.
Figure 2: Coverages of 0.90 (left panel, "Single channel, 90%") and 0.99 (right panel,
"Single channel, 99%") upper bounds, plotted against $\psi$, from 39,700 simulated datasets
from a single channel, with large uncertainty in the nuisance parameters, from the Banff
Challenge. The solid and dashed lines correspond respectively to $r^*(\psi)$ and $r^*_B(\psi)$.

Table 1: Coverage probabilities in a single channel simulation with 10,000 replications,
$\psi = 1$, $\log\beta = 1.1$, $\log\gamma = 0$, $t = 33$ and $u = 100$. Figures in bold differ
from the nominal level by more than simulation error.



3.3 Several channels 

Our approach extends easily to multiple channels. When there are $n > 1$ channels, the
nuisance parameters $(\lambda_{1k}, \lambda_{2k})$ are channel-specific, so the profile log
likelihood is simply the sum of profile log likelihood contributions for the individual
channels, which is then maximised numerically to get the overall estimate
$\hat\theta = (\hat\psi, \hat\lambda)$.
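The additive structure of the multi-channel profile log likelihood can be sketched as below. As a simplification of ours, each channel's nuisance parameter is maximised out in closed form using the conditional trinomial form of §4.2, where the larger root of a quadratic score equation is our own algebra and assumes $y_{1k}, y_{3k} > 0$. The overall estimate is then found by a golden-section search that assumes the profile is unimodal on the search interval. The two-channel dataset is hypothetical: it simply repeats the illustrative single-channel data, so that the overall estimate must coincide with the single-channel one.

```python
import math

def channel_profile(psi, y1, y2, y3, t, u):
    """Profile log likelihood contribution of one channel (trinomial form)."""
    a, b = 1.0 + t, psi + u
    B = (y1 + y2) * b - (y1 + y3) * a * psi
    z = (B + math.sqrt(B * B + 4.0 * a * y3 * y2 * psi * b)) / (2.0 * a * y3)
    return (y1 * math.log(psi + z) + y2 * math.log(z)
            - (y1 + y2 + y3) * math.log(psi + z + u + z * t))

def profile(psi, channels):
    """Sum of the channel contributions for a common signal psi."""
    return sum(channel_profile(psi, *ch) for ch in channels)

def maximise(f, lo, hi, iters=200):
    """Golden-section search, assuming f is unimodal on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    for _ in range(iters):
        if f(c) > f(d):
            hi, d = d, c
            c = hi - g * (hi - lo)
        else:
            lo, c = c, d
            d = lo + g * (hi - lo)
    return 0.5 * (lo + hi)

# Hypothetical data: (y1, y2, y3, t, u) for each channel
channels = [(1, 8, 14, 27.0, 80.0), (1, 8, 14, 27.0, 80.0)]
psi_hat = maximise(lambda p: profile(p, channels), 0.01, 50.0)
print(round(psi_hat, 3))
```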
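Table 2: Simulated multiple-channel data.

The remaining ingredient needed to compute the modified likelihood root $r^*(\psi)$ is the
$2n + 1$ dimensional canonical parameter $\varphi(\theta)$, which can be obtained using (5) and
(6). The first element of $\varphi(\theta)$ is

\[ \sum_{k=1}^n e^{\hat\lambda_{2k} - \hat\lambda_{1k}} \log\left(\psi e^{\lambda_{2k} - \lambda_{1k}} + e^{\lambda_{2k}}\right), \]

and the $2n$ other elements are

\[ e^{\hat\lambda_{2k} - \hat\lambda_{1k}} \left\{ \hat\psi \log\left(\psi e^{\lambda_{2k} - \lambda_{1k}} + e^{\lambda_{2k}}\right) + u_k(\lambda_{2k} - \lambda_{1k}) \right\}, \qquad e^{\hat\lambda_{2k}} \log\left(\psi e^{\lambda_{2k} - \lambda_{1k}} + e^{\lambda_{2k}}\right) + t_k \lambda_{2k} e^{\hat\lambda_{2k}}, \qquad k = 1, \ldots, n. \]

Any affine transformation of $\varphi(\theta)$ would give the same modified likelihood root.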

Figure 3 gives the profile and adjusted profile log likelihoods for $\psi$ and the corresponding
significance functions for an illustrative dataset with $n = 10$ channels shown in Table 2. The
interpretation of these plots is the same as for Figure 1. The modified likelihood root gives a
p-value of $7.709 \times 10^{-[\ldots]}$ for testing the presence of a signal, whereas that based
on the likelihood root is $3.124 \times 10^{-[\ldots]}$. The estimates are
$\hat\psi^* = 11.682$ and $\hat\psi = 11.487$ and the lower and upper bounds are
$\psi^*_{0.99} = 4.572$, $\psi^*_{0.01} = 23.191$ and $\psi_{0.99} = 4.496$,
$\psi_{0.01} = 22.907$. There is some evidence of a positive signal from these data, though the
modified likelihood root $r^*(\psi)$ gives weaker support than does the ordinary likelihood root
$r(\psi)$.

Boundary samples also arise in the multiple-channel case, though more infrequently than
with a single channel. In such cases we again used the likelihood root $r(\psi)$ for inference
on $\psi$.

Figure 4 shows coverages of the 0.90 and 0.99 left-tail confidence intervals (upper bounds)
computed with the modified likelihood root from 70,000 simulated datasets with $n = 10$ from
the Banff Challenge. Our approach seems to perform satisfactorily even with as many as 20
nuisance parameters, though there is again some undercoverage for small values of $\psi$.
Table 3 reports coverage probabilities for limits at various confidence levels for a simulation
performed with $\psi = 2$. The results for the modified likelihood root are always within
simulation error of the nominal levels, thus giving very accurate inference for $\psi$.

Figure 3: Inferential summaries for the simulated multiple-channel data in Table 2. For details,
see caption to Figure 1.

Figure 4: Coverages of 0.90 (left panel) and 0.99 (right panel) upper bounds from 70,000
simulated multiple-channel datasets from the Banff Challenge. The solid and dashed lines
correspond respectively to $r^*(\psi)$ and $r^*_B(\psi)$.

Table 3: Coverage probabilities in a multiple-channel simulation with 10,000 replications,
$\psi = 2$, $\beta = (0.20, 0.30, 0.40, \ldots, 1.10)$, $\gamma = (0.20, 0.25, 0.30, \ldots, 0.65)$,
$t = (15, 17, 19, \ldots, 33)$ and $u = (50, 55, 60, \ldots, 95)$. Figures in bold differ from
the nominal level by more than simulation error.

4 Bayesian inference 
4.1 Non-informative priors 

There is a close link between the modified likelihood root and analytical approximations useful
for Bayesian inference. Suppose that posterior inference is required for $\psi$ and that the
chosen prior density is $\pi(\psi, \lambda)$. Then it turns out that replacing (2) with

\[ \ell_p'(\psi)\, j_p(\hat\psi)^{-1/2} \left\{ \frac{|j_{\lambda\lambda}(\hat\theta_\psi)|}{|j_{\lambda\lambda}(\hat\theta)|} \right\}^{1/2} \frac{\pi(\hat\theta)}{\pi(\hat\theta_\psi)} \]

in formula (1), where $\ell_p'$ is the derivative of $\ell_p(\psi)$ with respect to $\psi$, leads
to a Laplace-type approximation to the marginal posterior distribution for $\psi$ that we will
denote by $r^*_B(\psi)$. This may be used to include prior information in the inferential
process, but as mentioned above, the choice of prior density can be vexing. In this section we
discuss non-informative Bayesian inference for $\psi$.

For models with scalar $\psi$ and a nuisance parameter $\xi$ that is orthogonal to $\psi$ in the
sense of Cox and Reid (1987), Tibshirani (1989) shows that up to a certain degree of
approximation, a prior density that is non-informative about $\psi$ is proportional to

\[ i_{\psi\psi}(\psi, \xi)^{1/2}\, g(\xi), \tag{7} \]

where $i_{\psi\psi}(\psi, \xi)$ denotes the $(\psi, \psi)$ element of the Fisher information
matrix, and $g(\xi)$ is an arbitrary positive function that satisfies mild regularity conditions.
Under further mild conditions (7) is a Jeffreys prior for $\psi$, and it is also a matching
prior: following Welch and Peers (1963), Reid et al. (2002) show how (7) yields $(1 - \alpha)$
one-sided Bayesian posterior confidence intervals that contain $\psi$ with probability
$(1 - \alpha) + O(n^{-1})$ in a frequentist sense. Unfortunately (7) requires one to express the
model in terms of an orthogonal parametrization, and this may be impossible. Below we rewrite
it in terms of an arbitrary parametrisation.

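Suppose therefore that the model is parametrized in terms of a scalar interest parameter
$\psi$ and a column vector nuisance parameter $\zeta = \zeta(\psi, \xi)$, with the log
likelihood written as $\ell^*\{\psi, \zeta(\psi, \xi)\} = \ell(\psi, \xi)$. Then the elements of
the Fisher information matrices in the two parametrizations are related by the equations

\[ i_{\psi\psi} = i^*_{\psi\psi} + 2\zeta_\psi^T i^*_{\zeta\psi} + \zeta_\psi^T i^*_{\zeta\zeta} \zeta_\psi, \qquad i_{\xi\psi} = \zeta_\xi^T \left( i^*_{\zeta\psi} + i^*_{\zeta\zeta} \zeta_\psi \right), \tag{8} \]

where $i^*_{\zeta\psi} = E(-\partial^2\ell^*/\partial\zeta\,\partial\psi)$,
$i^*_{\zeta\zeta} = E(-\partial^2\ell^*/\partial\zeta\,\partial\zeta^T)$,
$\zeta_\psi = \partial\zeta/\partial\psi$, and so forth. Parameter orthogonality implies that
$i_{\xi\psi} = 0$, so provided $\zeta_\xi$ is not identically zero, $\xi = \xi(\psi, \zeta)$ is
determined by the partial differential equation

\[ i^*_{\zeta\zeta} \frac{\partial\zeta}{\partial\psi} = -i^*_{\zeta\psi}, \tag{9} \]

which always has a set of solutions. On substituting (9) into the first expression in (8), we
find that in terms of the original parametrization the required element of the Fisher
information matrix may be written as

\[ i_{\psi\psi} = i^*_{\psi\psi} - i^*_{\psi\zeta}\, i^{*-1}_{\zeta\zeta}\, i^*_{\zeta\psi}, \]

whence the non-informative prior (7) may be written as

\[ \pi(\psi, \zeta) \propto \left( i^*_{\psi\psi} - i^*_{\psi\zeta}\, i^{*-1}_{\zeta\zeta}\, i^*_{\zeta\psi} \right)^{1/2} g\{\xi(\psi, \zeta)\} \left| \frac{\partial\xi}{\partial\zeta} \right|, \tag{10} \]

which requires that the orthogonal parameter $\xi$ be expressed in terms of the original
parameters; cf. expression (5) of Tibshirani (1989). In the next section we derive (10) for the
single- and multiple-channel models of §3.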

4.2 Application to Poisson model 



The single-channel model may be reparametrized in terms of $\psi$, $\gamma$ and
$\zeta = \beta/\gamma$, in which case $Y_1, Y_2, Y_3$ are independent Poisson variables with
means $\gamma(\psi + \zeta)$, $\zeta\gamma t$, $\gamma u$. This implies that the trinomial
density of $(Y_1, Y_2, Y_3)$ conditional on the total $S = Y_1 + Y_2 + Y_3$ does not depend on
$\gamma$, and there is no loss of information on $\psi$ and $\zeta$ if we base inference on the
trinomial or more generally the multinomial model (Barndorff-Nielsen, 1978, Ch. 10). In
particular, frequentist inferences on $\psi$ based on the original model or on the conditional
trinomial model lead to exactly the same results. Here $\zeta$ is scalar. Apart from additive
constants, the corresponding log likelihood is

\[ \ell^*(\psi, \zeta) = y_1 \log(\psi + \zeta) + y_2 \log\zeta - s \log(\psi + \zeta + u + \zeta t), \qquad \psi + \zeta,\ \zeta > 0, \]

and $E(Y_1 \mid S = s) = s(\psi + \zeta)/\pi$, $E(Y_2 \mid S = s) = st\zeta/\pi$, where
$\pi = \psi + \zeta + u + \zeta t$. Thus in this parametrization the Fisher information matrix
for the trinomial model has form

\[ i^*(\psi, \zeta) = \frac{s}{\pi^2(\psi + \zeta)} \begin{pmatrix} u + \zeta t & u - \psi t \\ u - \psi t & \{\psi t(\psi + u) + \zeta u(1 + t)\}/\zeta \end{pmatrix}, \]

and the orthogonal parameter is a solution of the equation

\[ \frac{\partial\zeta}{\partial\psi} = \frac{\zeta(\psi t - u)}{\psi t(\psi + u) + \zeta u(1 + t)}, \]

such as

\[ \xi(\psi, \zeta) = t \log\zeta + \log(\zeta + \psi) - (1 + t) \log(\psi + \zeta + u + \zeta t). \]

It is impossible to express $\zeta$ explicitly as a function of $\psi$ and $\xi$, and hence to
use the non-informative prior in the form (7), but (10) is readily obtained, and after a little
algebra turns out to be proportional to

\[ \left[ \frac{\psi t(\psi + u) + \zeta u(1 + t)}{\zeta^2 (\zeta + \psi)^2 (\psi + \zeta + u + \zeta t)^3} \right]^{1/2} g\left\{ \frac{\zeta^t (\zeta + \psi)}{(\psi + \zeta + u + \zeta t)^{1+t}} \right\} \tag{11} \]

for an arbitrary but smooth and positive function $g$.
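The algebra behind this prior can be checked numerically. The sketch below is ours and rests on stated assumptions: the Fisher information elements coded in `fisher` were obtained by differentiating the trinomial log likelihood $y_1\log(\psi+\zeta) + y_2\log\zeta - s\log(\psi+\zeta+u+\zeta t)$ and taking expectations, and $g$ is taken to be constant. The code checks that the orthogonal parameter $\xi(\psi, \zeta) = t\log\zeta + \log(\zeta+\psi) - (1+t)\log(\psi+\zeta+u+\zeta t)$ is constant along solutions of the orthogonality equation, and that the information about $\psi$ adjusted for $\zeta$ equals $stu/(\pi D)$, where $\pi = \psi+\zeta+u+\zeta t$ and $D = \psi t(\psi+u) + \zeta u(1+t)$ is our shorthand. The evaluation point is arbitrary.

```python
import math

def xi(psi, zeta, t, u):
    """Orthogonal parameter xi(psi, zeta)."""
    return (t * math.log(zeta) + math.log(zeta + psi)
            - (1.0 + t) * math.log(psi + zeta + u + zeta * t))

def D(psi, zeta, t, u):
    """Shorthand (ours) for psi*t*(psi + u) + zeta*u*(1 + t)."""
    return psi * t * (psi + u) + zeta * u * (1.0 + t)

def fisher(psi, zeta, s, t, u):
    """Expected information of the trinomial model: (i_pp, i_pz, i_zz)."""
    pi = psi + zeta + u + zeta * t
    c = s / (pi ** 2 * (psi + zeta))
    return c * (u + zeta * t), c * (u - psi * t), c * D(psi, zeta, t, u) / zeta

def prior_unnormalised(psi, zeta, s, t, u):
    """(adjusted information)^(1/2) * |d xi / d zeta|, with g constant."""
    i_pp, i_pz, i_zz = fisher(psi, zeta, s, t, u)
    pi = psi + zeta + u + zeta * t
    jac = D(psi, zeta, t, u) / (zeta * (zeta + psi) * pi)
    return math.sqrt(i_pp - i_pz ** 2 / i_zz) * jac

psi, zeta, s, t, u = 4.0, 1.7, 23, 27.0, 80.0
h = 1e-6
xi_psi = (xi(psi + h, zeta, t, u) - xi(psi - h, zeta, t, u)) / (2 * h)
xi_zeta = (xi(psi, zeta + h, t, u) - xi(psi, zeta - h, t, u)) / (2 * h)
slope = zeta * (psi * t - u) / D(psi, zeta, t, u)  # d zeta / d psi
print(abs(xi_psi + xi_zeta * slope) < 1e-5)        # xi is constant along the curve
```

Up to the constant factor $(stu)^{1/2}$, `prior_unnormalised` agrees with the closed-form expression $[D/\{\zeta^2(\zeta+\psi)^2\pi^3\}]^{1/2}$.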

If data $(y_{1k}, y_{2k}, y_{3k}, t_k, u_k)$ are available for $n$ independent channels, then
the conditioning argument above yields $n$ independent trinomial distributions for
$(y_{1k}, y_{2k}, y_{3k})$ conditional on the $s_k = y_{1k} + y_{2k} + y_{3k}$, whose
probabilities depend on the parameters $\psi, \zeta_k$. Apart from an additive constant the log
likelihood is

\[ \ell^*(\psi, \zeta_1, \ldots, \zeta_n) = \sum_{k=1}^n \left\{ y_{1k} \log(\psi + \zeta_k) + y_{2k} \log\zeta_k - s_k \log(\psi + \zeta_k + u_k + \zeta_k t_k) \right\}, \qquad \zeta_1, \ldots, \zeta_n > 0, \]

where $\psi > -\min(\zeta_1, \ldots, \zeta_n)$. Calculations like those leading to (11) reveal
that the non-informative prior for $\psi$ is proportional to

\[ \left[ \sum_{k=1}^n \frac{s_k t_k u_k}{(\psi + \zeta_k + u_k + \zeta_k t_k)\{\psi(\psi + u_k) t_k + \zeta_k u_k (1 + t_k)\}} \right]^{1/2} \prod_{k=1}^n \frac{\psi(\psi + u_k) t_k + \zeta_k u_k (1 + t_k)}{\zeta_k (\zeta_k + \psi)(\zeta_k + \psi + u_k + \zeta_k t_k)} \tag{12} \]

times an arbitrary function of the quantities

\[ \xi_k(\psi, \zeta_k) = t_k \log\zeta_k + \log(\zeta_k + \psi) - (1 + t_k) \log(\psi + \zeta_k + u_k + \zeta_k t_k), \qquad k = 1, \ldots, n. \]

Although (12) depends on the data through $s_1, \ldots, s_n$, these are constants under the
trinomial model, as are the $t_k$ and $u_k$ under both Poisson and trinomial models. The
presence of $s_k t_k u_k$ in the first term of (12) has the heuristic explanation that a channel
for which this product is large will contain more information about the corresponding
parameters.



4.3 Numerical results 

We first consider the single-channel data analyzed in §3.2, with $y_1 = 1$, $y_2 = 8$,
$y_3 = 14$, and $t = 27$, $u = 80$. The dotted lines in Figure 1 show the approximate posterior
function, $-r^*_B(\psi)^2/2$, and the corresponding significance function obtained using the
non-informative prior (11), with $g$ taken to be a constant function.

Typically the prior density yields larger lower bounds and smaller upper bounds than those
obtained from the frequentist solution, because the effect of the prior is to inject information
about the parameter of interest. In the present case, the estimate $\hat\psi^*_B = 4.9182$,
which satisfies $\Phi\{r^*_B(\hat\psi^*_B)\} = 0.5$, is smaller than the corresponding estimate
obtained using $r^*(\psi)$, and the 0.99 lower and upper bounds are respectively given by
$\Phi\{r^*_B(\psi^*_{B;0.99})\} = 0.99$ and $\Phi\{r^*_B(\psi^*_{B;0.01})\} = 0.01$, with
$\psi^*_{B;0.99} = -1.820$ and $\psi^*_{B;0.01} = 35.094$.

The p-value for testing the hypothesis $\psi = 0$ against the one-sided hypothesis $\psi > 0$ is
equal to $1 - \Phi\{r^*_B(0)\} = 0.1063$, which is again weak evidence of a positive signal.

The coverage properties of the non-informative Bayesian solution are similar to but not
quite so good as those of the frequentist solution, as shown in Figure 2 and by the simulation
results reported in the last column of Table 1.

Similar behavior is seen in the multi-channel case. Figure 3 shows the approximate posterior
function, $-r^*_B(\psi)^2/2$, and the corresponding significance function obtained using the
non-informative prior (12) times a constant function of $\xi_k(\psi, \zeta_k)$,
$k = 1, \ldots, n$, for the data in Table 2. The approximate Bayesian solution gives a p-value of
$4.865 \times 10^{-[\ldots]}$ for testing the presence of a signal, smaller than that obtained
from the frequentist solutions in §3.3. The estimate is $\hat\psi^*_B = 11.632$ and the lower and
upper bounds are $\psi^*_{B;0.99} = 4.699$ and $\psi^*_{B;0.01} = 23.030$. There is stronger
evidence of a positive signal from this approach than from the modified likelihood root
$r^*(\psi)$ and the ordinary likelihood root $r(\psi)$. However, simulation results reported in
Figure 4 and Table 3 show that the coverage of confidence sets based on the approximate Bayesian
solution is not quite so good as for sets based on the modified likelihood root.
5 Discussion 

We proposed procedures based on modern likelihood theory for detecting a signal in the pres- 
ence of background noise, using a simple statistical model. We suggest the use of the sig- 
nificance function based on the modified likelihood root as a comprehensive summary of the 
information for the parameter given the model and the observed data, from which p-values 
and one- or two-sided confidence limits can be obtained directly. 

Even in cases where there are 20 nuisance parameters, our frequentist procedure appears
to give essentially exact inferences for the signal parameter $\psi$. Its non-informative
Bayesian counterpart performs slightly worse in terms of coverage of confidence intervals and
levels for tests, but provides slightly better point estimates as solutions to the equation
$\Phi\{r^*_B(\psi)\} = 0.5$, analogous to median unbiased estimates. The most serious departures
from the correct coverage are for small values of $\psi$, corresponding to weak signals, and
arise because in such cases very low counts $y_1$ corresponding to the observed signal are quite
likely to arise. The case of a weak signal seems to be of little practical interest, because in
such cases no strong significance can be obtained. Although the Banff Challenge concerned
significance at the 90% and 99% levels, both general theory and the accuracy of our results
suggest that similar precision can be expected for much more extreme significance levels.

If $y_1 = 0$ our higher order approaches break down, though a closely related first order
inference is available. In such cases it is tempting to replace $y_1$ by $y_1 + c$, where $c$
is a small positive quantity. Firth (1993) investigates under what circumstances this
modification yields an improved estimate of the interest parameter in exponential family models,
taken on the canonical scale of the exponential family. Our model is not a linear exponential
family, but ideas of Kosmidis (2007) might be used to choose $c$ to yield an improved estimate
of $\psi$. Our main interest is in confidence intervals and tests, however, and since Firth's
correction corresponds to use of a default Jeffreys prior and we have found that use of a
non-informative prior does not improve coverage properties of our method, one should not be
optimistic about the effect of Firth's correction in our context.

In some instances the method may lead to empty confidence intervals or intervals including
only the value $\psi = 0$. From a frequentist perspective this is not a crucial problem. On the
one hand, even in such extreme samples the confidence function would yield a p-value to test for
the presence of a signal, and on the other hand, the concentration of the likelihood and
significance functions in a region of physically meaningless values of the parameter might
suggest that the model is inappropriate.